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Abstract. Ongoing observations of the Cosmic Microwave Background, such as the MAXIMA and 
BOOMERanG projects, are providing datasets of unprecedented quality and ever-increasing size. Exact anal- 
ysis of the data they produce is a serious computational challenge, currently scaling as the number of sky 
pixels squared in memory and cubed in time. Here we discuss the origins of these scaling relations and their 
implications for our efforts to extract precise cosmological parameters from observations of the CMB. 



INTRODUCTION 

The Cosmic Microwave Background is the most distant observation of photons we can ever make. Last scattered 
only 300,000 years after the Big Bang it provides a unique picture of the state of the universe at that time. In 
particular, fluctuations in the CMB directly trace the primordial density perturbations and so provide a powerful 
discriminant between alternative cosmologies of the very early universe. As a result the search for anisotropics in 
the CMB has been a cornerstone of cosmology for the last 30 years. 

Finally measured by the COBE satellite, the anisotropics proved to be of the order of only one part in a million 
on a 3K background whose uniformity was otherwise only broken by a dipole induced by the peculiar velocity of 
the galaxy of the order of one part in a thousand. Despite the tiny scale of these fluctuations, advances in detector 
technologies have enabled us to consider measuring them to the extraordinary accuracy and resolution necessary to 
determine the fundamental parameters of cosmology to better than 1% [1]. 

Such measurements include those of the MAXIMA and BOOMERanG projects — described in detail by Lee et al, 
and Masi et al and de Bernardis et al elsewhere in these Proceedings. These balloon-borne observations have already 
produced datasets an order of magnitude larger than their predecessors, and in subsequent flights will at least double 
this size. Beyond this, the MAP and PLANCK satellite missions will yield datasets 1-2 orders of magnitude larger 
again. The shear size of these datasets makes their analysis a serious computational challenge. It is this challenge, 
and the current status of our attempts to address it, that are discussed here. 

For simplicity we only consider the highly idealised case of extracting a power spectrum of Afi multipoles in Afb 
bins from a map of Af p pixels obtained from a single time-ordered sequence of Aft observations of the sky, the data 
only comprising CMB signal and Gaussian noise. In practice there are many additional sources of non-Gaussian 
contamination (in particular both galactic and extra-galactic foreground sources) making observations necessary at 
a range of frequencies to allow for their subtraction. 
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FROM THE TIME-ORDERED DATA TO THE MAP 



Formalism 



Our first step is to translate the observation from the temporal to the spatial domain — to make a map [2] (see 
also Jaffc et al elsewhere in these Proceedings). Knowing where the detector was pointing at each observation, 
(Ot, 4>t), and adopting a particular pixelization of the sky, we can construct a pointing matrix A tp whose entries give 
the weight of pixel p in observation t. For scanning experiments such as MAXIMA and BOOMERanG this has a 
particularly simple form 

A tn = {\ i *J* t 't') ep (1) 



tp — [ otherwise 

while other observing strategies would give a more complex structure. The data vector can now be written 

d = As + n (2) 

in terms of the pixelised CMB signal s and time-stream noise n. 

Under the assumption of Gaussianity, the noise probability distribution is 



(3) 



P(n) = (27r)- jVt/2 exp|-i (r^N^n + Tr [fnTV])} 

where Af is the time-time noise correlation matrix given by 

N = (nn T ) (4) 

We can now use equation (2) to substitute for the noise in equation (3), giving the probability of the data for a 
particular CMB signal as 



P(d\s) = (2ir)-^/ 2 exp j-i ({d - AsfN'^d - As) + Tr [In AT]) J 



(5) 



Assuming that all CMB maps are a priori equally likely, this is proportional to the likelihood of the signal given the 
data, and maximizing over s gives the maximum likelihood map m 

m=(A T N- 1 Ay 1 A T N- l d (6) 

Substituting back for the time-ordered data in this map we recover the obvious fact that it is the sum of the true 
CMB signal and some pixelized noise 

m= (A T N- 1 Ay 1 A T N- 1 {As + n) 

= s + v (7) 



where this pixel noise 



has correlations given by 



v= (A T N- 1 AY 1 A T N- 1 n (8) 



T = ( W T) 

= (A T N- 1 A)~ 1 (9) 



TABLE 1. Computational requirements for the map- making algorithm 
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Computational Requirements 

Making the map requires solving equation (6) which is conveniently divided into three steps: 

T" 1 = A T N- 1 A 
z = A T N~ 1 d 

and m=(T- 1 )- 1 z (10) 

The first half of table 1 shows the computational cost of a brute force approach to each of these steps (recall that 
multiplying an [a x b] matrix and an [b x c] matrix involves 2 x a x b x c operations). Thus for datasets such as 
MAXIMA-1 or BOOMERanG North America, with Af t ~ 2 x 10 6 and Af p ~ 3 x 10 4 , making the map would require 
of the order of 16 Tb of disc space (storing data in 4-byte precision), 7 Gb RAM 3 (doing all calculations in 8-byte 
precision) and 2.4 x 10 17 operations. If we were able to use a fast 600 MHz CPU at 100% efficiency this would still 
correspond to over 12 years of run time. 

Fortunately there are two crucial structural features to be exploited here. As noted above the pointing matrix A 
is usually very sparse, with only J\f a non-zero entries in each row. For simple scanning strategies such as MAXIMA, 
BOOMERanG and PLANCK, Af a = 1, with a single 1 in the column corresponding to the pixel being observed. 
For a differencing experiment such as COBE or MAP, Af a = 2, with a ±1 pair in the columns corresponding to the 
pixel pair being observed. Moreover, the inverse time-time noise correlations are (by fiat) both stationary and fall 
to zero beyond some time-separation much shorter than the duration of the observation 

N-'=f(\t-t'\) 

= v > t<m (n) 

so that the inverse time-time noise correlation matrix is symmetric and band-diagonal, with bandwith J\f T = 2t + 1 . 
The second half of table 1 shows the impact of exploiting this structure on the cost of each step. The limiting 
step is now no longer constructing the inverse pixel-pixel noise correlation matrix but solving for the map, which is 
unaffected by these features. For the same datasets making the map now takes of the order of 3.6 Gb of disc, 7 Gb 
of RAM, and 7 x 10 13 operations, or 32 hours of the same CPU time. 

Further acceleration of the map-making algorithm must therefore focus on a faster solution the final step, inverting 
the inverse pixel-pixel noise covariance matrix T _1 to obtain the map. However, as we shall see below, even this is 
not the limiting step overall in current algorithms. 

FROM THE MAP TO THE POWER SPECTRUM 
Formalism 

We now want to move to a realm where the CMB observation can be compared with the predictions of various 
cosmological theories — typically the angular power spectrum. We decompose the CMB signal at each pixel in 
spherical harmonics 

s p = ^2 a hn BiYi m (9 p , V> p ) (12) 

lm 



' ' Although it is possible to use out-of-core algorithms for operations such as matrix inversion the associated time overhead 
would be prohibative. We therefore assume that all such operations are carried out in core. 



where B is the pattern of the observation beam (assumed to be circularly symmetric) in ^-space. The correlations 
between such signals then become 



S pp > = (s p s p ,) = 'y]{ai m ai>m')BiBi l Yi m (6 p ,ipp)Yi lm i(6p/,ipp>) (13) 

Ira I'm' 

For isotropic fluctuations the correlations depend only on the angular separation 

(dlmdl'm') = Q5 U i5 mm > (14) 

and the pixel-pixel signal correlation matrix becomes 

2/ 4- 1 

s PP '^T,^r c ^ PP ') (15) 

where Pi is the Legendre polynomial and Xpp' the angle between the pixel pair p,p'. These Ci multipole powers 
completely characterise a Gaussian CMB, and are an otherwise model-independent basis in which to compare theory 
with observations. In general, due to incomplete sky coverage and low signal-to-noise, we are unable to extract each 
multipole moment independently We therefore group the multipolcs in bins, adopting a particular spectral shape 
function Cf and characterising the CMB signal by its bin powers C b with 

d = C b:leb Cf (16) 

Since the signal and noise are assumed to be realisations of independent Gaussian processes the pixel-pixel map 
correlations are 

Mppi = (mrriT) 

= (ss T ) + (yv T ) 

= S + T (17) 
and the probability distribution of the map given a particular power spectrum C is now 

P(m\C) = (27r)- jV "/ 2 cxp|-i {rr^M^m + Tr [In M])| (18) 

Assuming a uniform prioir for the spectra, this is proportional to the likelihood of the power spectrum given the map. 
Maximizing this over C then gives us the required result, namely the most likely CMB power spectrum underlying 
the original observation d. 

Finding the maximum of the likelihood function of equation (18) is generically a much harder problem than 
making the map. Since there is no closed-form solution corresponding to equation (6) we must find both a fast way 
to evaluate the likelihood function at a point, and an efficient way to search the A^-dimensional parameter space for 
the peak. The fastest general method extant is to use Newton-Raphson iteration to find the zero of the derivative 
of the logarithm of the likelihood function [3] . If the log likelihood function 

C(C) = ~ (m T M- 1 m + Tr [In M]) (19) 

were quadratic, then starting from some initial guess at the maximum likelihood power spectrum C the correction 
SC that would take us to the true peak would simply be 




SCo = -[ ^ — ) (20) 

C=C 

Since the log likelihood function is not quadratic, we now take 

d = C + SC (21) 

and iterate until 5C n <~ to the desired accuracy. Because any function is approximately quadratic near a peak, if 
we start searching sufficiently close to a peak this algorithm will converge to it. Of course there is no guarantee that 
it will be the global maximum, and in general there is no certainty about what 'sufficiently close' means in practice. 
However experience to date suggests that the log likelihood function is sufficiently strongly singley peaked to allow 
us to use this algorithm with some confidence. 



TABLE 2. Computational requirements for each iteration of the maximum 
likelihood power spectrum extraction algorithm 
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Computational Requirements 



The core of the algorithm is then to calculate the first two derivatives of the log likelihood function with respect 
to the multipole bin powers 
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Evaluating these derivatives comes down to solving the MbM p + 1 linear systems 



and 



z = M~ 1 m 



(22) 
(23) 

(24) 
(25) 



and assembling the results 



§ b =l(m-W b z-Tr { W b] ) 

= ~m T W h W v z + \Tr [W b W v 



(26) 



Table 2 shows the computational cost of these operations, where solving the linear systems has been optimised 
by first Cholesky decomposing the matrix M . Solving equation (20) has been excluded since its cost is entirely 
negligible, depending only on the number of multipole bins A/& -C M p . Obtaining the maximum likelihood power 
spectrum for the same datasets as above, with N p ~3x 10 4 and A4 ~ 20, then requires of the order of 150 Gb disc, 
14 Gb of RAM, and 10 15 operations per iteration, or 21 days of our 600 MHz CPU time. 

Such numbers are at least conceivable; however, as shown in table 3, the scaling with map size pushes forthcoming 
balloon observations well beyond the capacity of the most powerful single processor machine — and even allowing 
for the continued doubling of computer power every 18 months predicted by Moore's 'law' we would still have to wait 
20 years for a serial machine capable of handling the BOOMERanG Long Duration Balloon flight data. Moreover, 
these timings are for a single iteration (and typically the alogorithm needs O(10) iterations to converge) for a single 
run through the dataset, although undoubtedly we will want to perform several slightly different runs to check the 
robustness of our analysis. 

One way to increase our capability now is to move to parallel machines, such as the 640-processor Cray T3E at 
NERSC. Since the algorithm is limited by matrix-matrix operations (Cholesky decomposition and triangular solves) 
we are able to exploit the most optimised protocols — the level 3 BLAS — and the DEC Alpha chips' capacity to 
perform an add and a multiply in a single clock cycle. Coupled with a fincly-tuncd dense packing of the matrices on 
the processors this has enabled us to sustain upwards of 2/3 of the theoretical peak performance of 900MHz. This 
enables us to increase the limiting datasize to around 80,000 pixels. 



FUTURE PROSPECTS 



We have seen that existing algorithms are capable of dealing with CMB datasets with at most 10 5 pixels. Over 
the next 10 years a range of observations are expected to produce datasets of 5 x 10 5 (BOOMERanG LDB), 10 6 



TABLE 3. The computational requirements for one iteration of the Newton-Raphson algorithm to 
extract a 20-bin power spectrum for MAXIMA and BOOMERanG 
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(MAP) and 10 7 (PLANCK) pixels that will necessarily require new techniques. This is an ongoing area of research 
in which some progress has been made in particular special cases. 

The limiting steps in the above analysis are associated with operations involving the pixel-pixel correlation matrices 
for the noise T, the signal S, and most particularly their sum M. The problem here is the noise and the signal have 
different natural bases. The inverse noise correlations are symmetric, band-diagonal and approximately circulant in 
the time domain, while the signal correlations are diagonal in the spherical harmonic domain. However there is no 
known basis in which the map correlations take a similarly simple form. 

If the noise is uncorrelated between pixels and is azimuthally symmetric — as has been argued will be the case 
for the MAP satellite due to its chopped observing strategy — then the pixel-pixel data correlation matrix can be 
dramatically simplified, reducing the analysis to OiAfp^ 2 ) in storage and O(Afp) in operations [4]. Although some 
work has also been done extending this to observations with marginal azimuthal asymmetry it is still inapplicable for 
the spatially correlated noise inherent to the simple scanning strategies adopted by MAXIMA and BOOMERanG 
(which also face the problem of irregular sky coverage) or PLANCK; for such observations the problem remains 
unsolved. 
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